AITopics | model segment

Collaborating Authors

model segment

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference

Li, Xiangchen, Ghafouri, Saeid, Ji, Bo, Vandierendonck, Hans, John, Deepu, Nikolopoulos, Dimitrios S.

arXiv.org Artificial IntelligenceJul-1-2025

As machine learning inferences increasingly move to edge devices, adapting to diverse computational capabilities, hardware, and memory constraints becomes more critical. Instead of relying on a pre-trained model fixed for all future inference queries across diverse edge devices, we argue that planning an inference pattern with a request-specific model tailored to the device's computational capacity, accuracy requirements, and time constraints is more cost-efficient and robust to diverse scenarios. To this end, we propose an accuracy-aware and workload-balanced inference system that integrates joint model quantization and inference partitioning. In this approach, the server dynamically responds to inference queries by sending a quantized model and adaptively sharing the inference workload with the device. Meanwhile, the device's computational power, channel capacity, and accuracy requirements are considered when deciding. Furthermore, we introduce a new optimization framework for the inference system, incorporating joint model quantization and partitioning. Our approach optimizes layer-wise quantization bit width and partition points to minimize time consumption and cost while accounting for varying accuracy requirements of tasks through an accuracy degradation metric in our optimization model. To our knowledge, this work represents the first exploration of optimizing quantization layer-wise bit-width in the inference serving system, by introducing theoretical measurement of accuracy degradation. Simulation results demonstrate a substantial reduction in overall time and power consumption, with computation payloads decreasing by over 80% and accuracy degradation kept below 1%.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2506.23934

Country:

North America > United States (0.14)
Europe > Poland (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

Zhao, Junchen, Song, Yurun, Liu, Simeng, Harris, Ian G., Jyothi, Sangeetha Abdu

arXiv.org Artificial IntelligenceDec-1-2023

Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices. LinguaLinked ensures data privacy by processing information locally. LinguaLinked uses three key strategies. First, an optimized model assignment technique segments LLMs and uses linear optimization to align segments with each device's capabilities. Second, an optimized data transmission mechanism ensures efficient and structured data flow between model segments while also maintaining the integrity of the original model structure. Finally, LinguaLinked incorporates a runtime load balancer that actively monitors and redistributes tasks among mobile devices to prevent bottlenecks, enhancing the system's overall efficiency and responsiveness. We demonstrate that LinguaLinked facilitates efficient LLM inference while maintaining consistent throughput and minimal latency through extensive testing across various mobile devices, from high-end to low-end Android devices. In our evaluations, compared to the baseline, LinguaLinked achieves an inference performance acceleration of $1.11\times$ to $1.61\times$ in single-threaded settings, $1.73\times$ to $2.65\times$ with multi-threading. Additionally, runtime load balancing yields an overall inference acceleration of $1.29\times$ to $1.32\times$.

inference, lingualinked, mobile device, (17 more...)

arXiv.org Artificial Intelligence

2312.00388

Country:

North America > United States > California > Orange County > Irvine (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Decentralized Federated Learning: A Segmented Gossip Approach

Hu, Chenghao, Jiang, Jingyan, Wang, Zhi

arXiv.org Machine LearningAug-21-2019

The emerging concern about data privacy and security has motivated the proposal of federated learning, which allows nodes to only synchronize the locally-trained models instead their own original data. Conventional federated learning architecture, inherited from the parameter server design, relies on highly centralized topologies and the assumption of large nodes-to-server bandwidths. However, in real-world federated learning scenarios the network capacities between nodes are highly uniformly distributed and smaller than that in a dat-acenter. It is of great challenges for conventional federated learning approaches to efficiently utilize network capacities between nodes. In this paper, we propose a model segment level decentralized federated learning to tackle this problem. In particular, we propose a segmented gossip approach, which not only makes full utilization of node-to- node bandwidth, but also has good training convergence. The experimental results show that even the training time can be highly reduced as compared to centralized federated learning.

artificial intelligence, machine learning, model segment, (17 more...)

arXiv.org Machine Learning

1908.07782

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Multiscale Random Fields with Application to Contour Grouping

Latecki, Longin J., Lu, Chengen, Sobel, Marc, Bai, Xiang

Neural Information Processing SystemsDec-31-2009

We introduce a new interpretation of multiscale random fields (MSRFs) that admits efficient optimization in the framework of regular (single level) random fields (RFs). It is based on a new operator, called append, that combines sets of random variables (RVs) to single RVs. We assume that a MSRF can be decomposed into disjoint trees that link RVs at different pyramid levels. The append operator is then applied to map RVs in each tree structure to a single RV. We demonstrate the usefulness of the proposed approach on a challenging task involving grouping contours of target shapes in images. MSRFs provide a natural representation of multiscale contour models, which are needed in order to cope with unstable contour decompositions. The append operator allows us to find optimal image labels using the classical framework of relaxation labeling, Alternative methods like Markov Chain Monte Carlo (MCMC) could also be used.

artificial intelligence, assignment, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback